Online Variance Reduction for Stochastic Optimization

Authors

  • Zalán Borsos
  • Andreas Krause
  • Kfir Y. Levy
Abstract

Modern stochastic optimization methods often rely on uniform sampling, which is agnostic to the underlying characteristics of the data. This can degrade convergence by yielding gradient estimates with high variance. A possible remedy is to employ non-uniform importance sampling techniques, which take the structure of the dataset into account. In this work, we investigate a recently proposed setting that poses variance reduction as an online optimization problem with bandit feedback. We devise a novel and efficient algorithm for this setting that finds a sequence of importance sampling distributions competitive with the best fixed distribution in hindsight, the first result of this kind. While we present our method for sampling datapoints, it naturally extends to selecting coordinates or even blocks thereof. Empirical validations underline the benefits of our method in several settings.
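
To make the sampling scheme concrete, here is a minimal Python sketch of generic importance-sampled SGD with an online-adapted distribution under bandit feedback (only the gradient of the drawn point is observed). It illustrates the general technique, not the authors' algorithm; grad_fn, the mixing weight mix, and the norm-tracking update are assumptions made for the example.

import numpy as np

def importance_sampled_sgd(w, grad_fn, n, steps=1000, lr=0.1, mix=0.1):
    # grad_fn(w, i): gradient of the i-th loss term (assumed, illustrative)
    p = np.full(n, 1.0 / n)              # start from uniform sampling
    norm_est = np.ones(n)                # running per-point gradient-norm estimates
    for _ in range(steps):
        q = (1 - mix) * p + mix / n      # mix with uniform so every point keeps mass
        i = np.random.choice(n, p=q)     # bandit feedback: only point i is revealed
        g = grad_fn(w, i)
        w = w - lr * g / (n * q[i])      # 1/(n * q_i) reweighting keeps the estimate unbiased
        norm_est[i] = 0.9 * norm_est[i] + 0.1 * np.linalg.norm(g)
        p = norm_est / norm_est.sum()    # heuristic: sample roughly proportional to gradient norm
    return w

Sampling roughly in proportion to (estimated) gradient norms is a classical variance-reduction heuristic; the paper's algorithm instead comes with a guarantee of competitiveness against the best fixed distribution in hindsight.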

Similar Articles

Variance Reduction for Stochastic Gradient Optimization

Stochastic gradient optimization is a class of widely used algorithms for training machine learning models. To optimize an objective, it uses a noisy gradient computed from random data samples instead of the true gradient computed from the entire dataset. However, when the variance of the noisy gradient is large, the algorithm might spend much time bouncing around, leading to slower conve...
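
A tiny self-contained illustration of that point, using a least-squares objective chosen only for the example: the per-sample gradient is an unbiased but noisy estimate of the full-dataset gradient.

import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

w = np.zeros(d)
full_grad = X.T @ (X @ w - y) / n              # true gradient over the entire dataset
sample_grads = X * (X @ w - y)[:, None]        # one noisy gradient per datapoint
print(np.allclose(sample_grads.mean(axis=0), full_grad))   # unbiased: True
print(sample_grads.var(axis=0).sum())                      # variance of the noisy estimate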

Randomized Block Coordinate Descent for Online and Stochastic Optimization

Two types of low cost-per-iteration gradient descent methods have been extensively studied in parallel. One is online or stochastic gradient descent (OGD/SGD), and the other is randomized block coordinate descent (RBCD). In this paper, we combine the two types of methods and propose online randomized block coordinate descent (ORBCD). At each iteration, ORBCD only computes the partial gradie...
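
A hedged sketch of that combination (an illustration of the general idea, not necessarily the paper's exact update): each iteration draws one random sample and one random coordinate block, then updates only that block with the corresponding partial stochastic gradient. The least-squares loss and all names here are assumptions for the example.

import numpy as np

def orbcd_style_sketch(X, y, n_blocks=4, steps=5000, lr=0.01):
    n, d = X.shape
    w = np.zeros(d)
    blocks = np.array_split(np.arange(d), n_blocks)   # partition coordinates into blocks
    for _ in range(steps):
        i = np.random.randint(n)                      # random datapoint
        b = blocks[np.random.randint(n_blocks)]       # random coordinate block
        residual = X[i] @ w - y[i]
        w[b] -= lr * residual * X[i, b]               # partial gradient on that block only
    return w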

Accelerated Stochastic Power Iteration

Principal component analysis (PCA) is one of the most powerful tools in machine learning. The simplest method for PCA, the power iteration, requires O(1/∆) full-data passes to recover the principal component of a matrix with eigen-gap ∆. Lanczos, a significantly more complex method, achieves an accelerated rate of O(1/√∆) passes. Modern applications, however, motivate methods that only ingest...
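
For reference, a plain power-iteration sketch (the textbook method the excerpt starts from, not the accelerated variant); A is assumed to be a symmetric matrix, e.g. a sample covariance. Each iteration costs one matrix-vector product, i.e. one full-data pass, and roughly O(1/∆) such passes are needed when the eigen-gap is ∆.

import numpy as np

def power_iteration(A, iters=1000):
    v = np.random.randn(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = A @ v                    # one full pass over the data
        v /= np.linalg.norm(v)
    return v, v @ A @ v              # top-eigenvector estimate and its Rayleigh quotient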

Online Variance-reducing Optimization

We emphasize the importance of variance reduction in stochastic methods and propose a probabilistic interpretation as a way to store information about past gradients. The resulting algorithm is very similar to the momentum method, with the difference that the weight over past gradients depends on the distance moved in parameter space rather than the number of steps.
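
A speculative sketch, only to make the stated idea concrete (the weighting rule, scale parameter, and names below are not taken from the paper): a momentum-like average of past gradients whose forgetting factor depends on the distance moved in parameter space rather than on the step count.

import numpy as np

def distance_weighted_momentum(w, grad_fn, steps=1000, lr=0.1, scale=10.0):
    # grad_fn(w): stochastic gradient at w (assumed, illustrative)
    m = np.zeros_like(w)
    w_prev = w.copy()
    for _ in range(steps):
        g = grad_fn(w)
        dist = np.linalg.norm(w - w_prev)
        beta = np.exp(-scale * dist)     # old gradients fade faster after large moves
        m = beta * m + (1 - beta) * g
        w_prev = w.copy()
        w = w - lr * m
    return w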

Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure

Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. However, in the context of empirical risk minimization, it is often helpful to augment the training set by considering random perturbations of input examples. In this case, the objective is no longer a finite sum, and the main candidate for optimization is the stochas...
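
To make the setting concrete, a minimal sketch (an assumed example, not the paper's method): with random input perturbations the objective is an expectation over infinitely many perturbed examples, so plain SGD samples both a datapoint and a fresh perturbation at every step; least squares is used only for illustration.

import numpy as np

def sgd_with_perturbations(X, y, steps=5000, lr=0.01, noise=0.1):
    n, d = X.shape
    w = np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(steps):
        i = rng.integers(n)
        x_pert = X[i] + noise * rng.standard_normal(d)   # fresh random perturbation
        w -= lr * (x_pert @ w - y[i]) * x_pert            # stochastic gradient step
    return w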


Journal:
  • CoRR

Volume: abs/1802.04715

Pages: -

Publication date: 2018